Variable selection strategies for nearest neighbor imputation methods used in remote sensing based forest inventory
Abstract
We examined the problem of selecting predictor variables for Nearest Neighbor (NN) imputation in remote sensing based forest inventory. Eighty-three variables were calculated from Airborne Laser Scanning data and aerial images, with responses being either dominant height or a set of five common stand attributes. Three different approaches to selecting predictor variables were compared. Analyses were repeated with three different NN imputation methods using a varying number of predictor variables. Results indicated that variable selection is justified, but it must be done properly. The most accurate method to select predictors was to minimize error using Simulated Annealing. For a single response, the most accurate imputation method was Random Forest proximity matrix-based imputation, whereas Most Similar Neighbor was the most accurate for five responses. An optimization-based distance metric also worked well. We also examined the degree to which different imputation methods are prone to overfitting, as well as how to properly do cross-validation in NN imputation.
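As an illustration of selecting predictors by minimizing error with simulated annealing, the following is a minimal sketch and not the procedure used in the study. It assumes a single response, 1-NN imputation with unweighted Euclidean distance (a simplification; it is neither Most Similar Neighbor nor Random Forest proximity), leave-one-out RMSE as the error to minimize, and a geometric cooling schedule. The function names `loo_rmse` and `anneal_select`, all parameter values, and the synthetic data are hypothetical.

```python
import math
import random


def loo_rmse(X, y, cols):
    """Leave-one-out RMSE of 1-NN imputation using only predictor columns `cols`."""
    cols = sorted(cols)  # fixed order keeps the floating-point sums reproducible
    n = len(X)
    sq_err = 0.0
    for i in range(n):
        best_d, best_j = float("inf"), None
        for j in range(n):
            if j == i:
                continue  # a target may not serve as its own reference
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_d, best_j = d, j
        sq_err += (y[i] - y[best_j]) ** 2
    return math.sqrt(sq_err / n)


def anneal_select(X, y, n_select, steps=300, t0=1.0, cooling=0.98, seed=0):
    """Search for `n_select` predictor columns with low leave-one-out RMSE."""
    p = len(X[0])
    if not 0 < n_select < p:
        raise ValueError("n_select must be between 1 and p - 1")
    rng = random.Random(seed)
    current = rng.sample(range(p), n_select)
    cur_err = loo_rmse(X, y, current)
    best, best_err = list(current), cur_err
    temp = t0
    for _ in range(steps):
        # Propose a neighbor: swap one selected variable for an unselected one.
        candidate = list(current)
        unused = [c for c in range(p) if c not in current]
        candidate[rng.randrange(n_select)] = rng.choice(unused)
        cand_err = loo_rmse(X, y, candidate)
        # Always accept improvements; accept a worse subset with
        # probability exp(-increase / temperature).
        if cand_err <= cur_err or rng.random() < math.exp((cur_err - cand_err) / temp):
            current, cur_err = candidate, cand_err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
        temp *= cooling  # geometric cooling (an assumed schedule)
    return sorted(best), best_err


# Synthetic example: 60 plots, 8 candidate predictors, response driven by columns 0 and 3.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
y = [2.0 * row[0] - row[3] + data_rng.gauss(0, 0.1) for row in X]
selected, err = anneal_select(X, y, n_select=2)
print(selected, round(err, 3))
```

Because the subset is tuned on the same leave-one-out error that is reported, `err` is likely to be optimistic; an honest accuracy estimate would repeat the selection inside each cross-validation fold or use a separate hold-out set.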
Related resources
Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests
Collecting appropriate qualitative and quantitative data is necessary for proper management and planning. Using suitable inventory methods is necessary, and the accuracy of sampling methods depends on the inventory grid and the number of sample points. The nearest neighbor sampling method is one of the distance methods and is calculated by three equations (Byth and Riple, 1980; Cotam and Curtis, 1956 and Cota...
yaImpute: An R Package for kNN Imputation
This article introduces yaImpute, an R package for nearest neighbor search and imputation. Although nearest neighbor imputation is used in a host of disciplines, the methods implemented in the yaImpute package are tailored to imputation-based forest attribute estimation and mapping. The impetus for writing yaImpute is a growing interest in nearest neighbor imputation methods for spatially ex...
K-Nearest Neighbor Method for Classification of Forest Encroachment by Using Reflectance Processing of Remote Sensing Spectroradiometer Data
This study reports results on the use of the K-Nearest Neighbor method for forest classification. The major focus is on the data and techniques that can be used to identify changes in forest features. This study concentrates on identifying forest encroachment in tropical forests such as the forests of Malaysia. This technique study will establish a strong mechanism that can be used...
The roles of nearest neighbor methods in imputing missing data in forest inventory and monitoring databases
Almost universally, forest inventory and monitoring databases are incomplete, ranging from missing data for only a few records and a few variables, common for small land areas, to missing data for many observations and many variables, common for large land areas. For a wide variety of applications, nearest neighbor (NN) imputation methods have been developed to fill in observations of variables...